Abstract

Consider a class $\mathcal{H}$ of binary functions $h : X \to \{-1, +1\}$ on a finite interval $X = [0, B] \subset \mathbb{R}$. Define the sample width of $h$ on a finite subset (a sample) $S \subset X$ as $\omega_S(h) = \min_{x \in S} |\omega_h(x)|$ where $\omega_h(x) = h(x) \max\{a \ge 0 : h(z) = h(x),\ x - a \le z \le x + a\}$. Let $\mathbb{S}_\ell$ be the space of all samples in $X$ of cardinality $\ell$ and consider sets of wide samples, i.e., hypersets which are defined as $A_{\beta,h} = \{S \in \mathbb{S}_\ell : \omega_S(h) \ge \beta\}$. Through an application of the Sauer-Shelah result on the density of sets an upper estimate is obtained on the growth function (or trace) of the class $\{A_{\beta,h} : h \in \mathcal{H}\}$, $\beta > 0$, i.e., on the number of possible dichotomies obtained by intersecting all hypersets with a fixed collection of samples $S \in \mathbb{S}_\ell$ of cardinality $m$. The estimate is $2 \sum_{i=0}^{2\lfloor B/(2\beta) \rfloor} \binom{m-\ell}{i}$.
1 Overview

Let $B > 0$ and define the domain as $X = [0, B]$. In this paper we consider the class $\mathcal{H}$ of all binary functions $h : X \to \{-1, +1\}$ which have only simple discontinuities, i.e., at any point $x$ the limits $h(x^+) = \lim_{z \to x^+} h(z)$ from the right and similarly from the left $h(x^-)$ exist (but are not necessarily equal). A main theme of our recent work has been to characterize binary functions based on their behavior on a finite subset of $X$. In [?] we showed that, when learning binary functions from a finite labeled sample, the generalization error bounds improve if the learner obtains a hypothesis which, in addition to minimizing the empirical sample error, is also 'smooth' around elements of the sample. This notion of smoothness (used also in [?]) is based on the simple notion of width of $h$ at $x$ which is defined as

\[ \omega_h(x) = h(x) \max\{a \ge 0 : h(z) = h(x),\ x - a \le z \le x + a\}. \]

For a finite subset (also called sample) $S \subset X$ the sample width of $h$, denoted $\omega_S(h)$, is defined as

\[ \omega_S(h) = \min_{x \in S} |\omega_h(x)|. \]
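To make the two definitions concrete, the following is a minimal sketch in Python. It assumes, purely for illustration, that $h$ is piecewise constant and is described by a sorted list of sign-change points; the names `width`, `sample_width`, `roots` and `first_sign` are hypothetical and do not come from the paper.

```python
import math
from bisect import bisect_right

def width(x, roots, first_sign=1):
    """Signed width of h at x: h(x) times the distance to the nearest sign change.

    Assumption: h equals first_sign to the left of the first root and flips
    sign at every point of the sorted list `roots`.
    """
    if not roots:
        # h never changes sign, so the width is unbounded.
        return first_sign * math.inf
    # The number of sign changes at or to the left of x determines h(x).
    sign = first_sign * (-1) ** bisect_right(roots, x)
    return sign * min(abs(x - r) for r in roots)

def sample_width(sample, roots, first_sign=1):
    """Sample width: the smallest absolute width over the points of the sample."""
    return min(abs(width(x, roots, first_sign)) for x in sample)
```

For instance, with sign changes at 2.4 and 3.6, the sample {1.0, 3.0, 6.0} has sample width approximately 0.6, attained at the point 3.0.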



This definition of width resembles the notion of sample margin of a real-valued function $f$ (see for instance [?]). We say that a sample $S$ is wide for $h$ if the width $\omega_S(h)$ is large. Wide samples implicitly contain more side information, for instance about a learning problem. The current paper aims at estimating the complexity of the class of wide samples for functions in $\mathcal{H}$. This complexity is related to a notion of description complexity, and knowing it makes it possible to compute the efficiency of the information that is implicit in samples for learning (see [?]).



2 Introduction 

For any logical expression $A$ denote by $\mathbb{I}\{A\}$ the indicator function which takes the value 1 or 0 whenever the statement $A$ is true or false, respectively. Let $\ell$ be any fixed positive integer and define the space $\mathbb{S}_\ell$ of all samples $S \subset X$ of size $\ell$. On $\mathbb{S}_\ell$ consider sets of wide samples, i.e.,

\[ A_{\beta,h} = \{S \in \mathbb{S}_\ell : \omega_S(h) \ge \beta\}, \quad \beta > 0. \]

We refer to such sets as hypersets. It will be convenient to associate with these sets the indicator functions on $\mathbb{S}_\ell$ which are denoted as

\[ h'_{\beta,h}(S) = \mathbb{I}_{A_{\beta,h}}(S). \]

These are referred to as hyperconcepts and we may write $h'$ for brevity. For any fixed width parameter $\gamma > 0$ define the hyperclass

\[ \mathcal{H}'_\gamma = \{h'_{\gamma,h} : h \in \mathcal{H}\}. \tag{1} \]

In words, $\mathcal{H}'_\gamma$ consists of all sets of subsets $S \subset X$ of cardinality $\ell$ on which the corresponding binary functions $h$ are wide by at least $\gamma$.

The aim of the paper is to compute the complexity of the hyperclass $\mathcal{H}'_\gamma$ that corresponds to the class $\mathcal{H}$. Since the domain $X$ is infinite, so is $\mathcal{H}'_\gamma$, hence one cannot simply measure its cardinality. Instead we apply a standard combinatorial measure of the complexity of a family of sets as follows: suppose $Y$ is a general domain and $\mathcal{G}$ is an infinite class of subsets of $Y$. For any subset $S = \{y_1, \ldots, y_n\} \subset Y$ let

\[ \Gamma_{\mathcal{G}}(S) \equiv |\mathcal{G}_{|S}| \tag{2} \]

where $\mathcal{G}_{|S} = \{[\mathbb{I}_G(y_1), \ldots, \mathbb{I}_G(y_n)] : G \in \mathcal{G}\}$. The growth function (see for instance [?]) is defined as

\[ \Gamma_{\mathcal{G}}(n) = \max_{\{S : S \subset Y,\ |S| = n\}} \Gamma_{\mathcal{G}}(S). \]

It measures the rate at which the number of dichotomies obtained by intersecting subsets $G$ of $\mathcal{G}$ with a finite set $S$ increases as a function of the cardinality $n$ of $S$ in the maximal case (it is also called the trace of $\mathcal{G}$ in [?]).
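As an illustration of the quantity in (2), the sketch below counts the dichotomies that a class of sets induces on a fixed sample. The paper treats infinite classes; here a small finite family stands in for the class, and the names `trace` and `growth_on` are hypothetical.

```python
def trace(family, sample):
    """Set of distinct indicator vectors [I_G(y_1), ..., I_G(y_n)] over G in family."""
    return {tuple(int(y in G) for y in sample) for G in family}

def growth_on(family, sample):
    """Number of dichotomies that the family induces on the sample."""
    return len(trace(family, sample))

# Illustrative family: all integer intervals {a, ..., b} inside {0, ..., 5},
# together with the empty set.
intervals = [set(range(a, b + 1)) for a in range(6) for b in range(a, 6)] + [set()]
```

On the sample [1, 2, 3] this family induces 7 of the 8 possible dichotomies; the pattern (1, 0, 1) is missing because a single interval cannot contain 1 and 3 while excluding 2.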

Since we are interested in hypersets as opposed to simple sets $G$ (as above), we consider the trace on a finite collection $\zeta \subset \mathbb{S}_\ell$ of samples (instead of a finite sample $S$ as above). It will be convenient to define the cardinality of such a collection as the cardinality of the union of its component sets, i.e., for any given finite collection $\zeta \subset \mathbb{S}_\ell$ let

\[ |\zeta| = \Big| \bigcup_{S \in \zeta} S \Big| \tag{3} \]

and we use $m$ to denote a possible value of $|\zeta|$. As a measure of complexity of $\mathcal{H}'_\gamma$ we compute the growth as a function of $m$, i.e.,

\[ \Gamma_{\mathcal{H}'_\gamma}(m) = \max_{\zeta \subset \mathbb{S}_\ell,\ |\zeta| = m} \Gamma_{\mathcal{H}'_\gamma}(\zeta). \]

3 Main result 

Let us state the main result of the paper. 

Theorem 1 Let $\ell, m > 0$ be finite integers and $B > 0$ a finite real number. Let $\mathcal{H}$ be the class of binary functions on $[0, B]$ (with only simple discontinuities). For a given width parameter value $\gamma > 0$, the corresponding hyperclass $\mathcal{H}'_\gamma$ on the space $\mathbb{S}_\ell$ has a growth which is bounded as

\[ \Gamma_{\mathcal{H}'_\gamma}(m) \le 2 \sum_{i=0}^{2\lfloor B/(2\gamma) \rfloor} \binom{m-\ell}{i}. \]
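The bound of Theorem 1 is straightforward to evaluate numerically. The following is a minimal sketch, with the hypothetical function name `growth_bound`; it assumes $m \ge \ell$ and uses `math.comb`, which is available from Python 3.8.

```python
import math

def growth_bound(m, ell, B, gamma):
    """Evaluate 2 * sum_{i=0}^{2*floor(B/(2*gamma))} C(m - ell, i)."""
    if m < ell:
        raise ValueError("expected m >= ell")
    d = 2 * math.floor(B / (2 * gamma))
    # math.comb(n, i) returns 0 when i > n, so the sum saturates at 2**(m - ell).
    return 2 * sum(math.comb(m - ell, i) for i in range(d + 1))
```

For example, `growth_bound(20, 5, 2, 0.5)` sums the binomial coefficients $\binom{15}{0}, \ldots, \binom{15}{4}$ and doubles the result, giving 3882.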



Remark 1 For $m > \ell + B/\gamma$, the following simpler bound holds

\[ \Gamma_{\mathcal{H}'_\gamma}(m) \le 2 \left( \frac{e\gamma(m-\ell)}{B} \right)^{B/\gamma}. \]



Before proving this result we need some additional notation. We denote by $\langle a, b \rangle$ a generalized interval set of the form $[a, b]$, $(a, b)$, $[a, b)$ or $(a, b]$. For a set $R$ we write $\mathbb{I}_R(x)$ to represent the indicator function of the statement $x \in R$. In case of an interval set $R = \langle a, b \rangle$ we write $\mathbb{I}\langle a, b \rangle$.

Proof. Any binary function $h$ may be represented by thresholding a real-valued function $f$ on $X$, i.e., $h(x) = \mathrm{sgn}(f(x))$ where for any $a \in \mathbb{R}$, $\mathrm{sgn}(a) = +1$ or $-1$ if $a > 0$ or $a \le 0$, respectively. The idea is to choose a class $\mathcal{F}$ of real-valued functions $f$ which is rich enough (it has to be infinite since there are infinitely many binary functions on $X$) but is as simple as we can find. This is important since, as we will show, the growth function of $\mathcal{H}'_\gamma$ is bounded from above by the complexity of a class that is a variant of $\mathcal{F}$.

We start by constructing such an $\mathcal{F}$. For a binary function $h$ on $X$ consider the corresponding set sequence $\{R_i\}_{i=1,2,\ldots}$ which satisfies the following properties: (a) $[0, B] = \bigcup_{i=1,2,\ldots} R_i$ and for any $i \ne j$, $R_i \cap R_j = \emptyset$, (b) $h$ alternates in sign over consecutive sets $R_i$, $R_{i+1}$, (c) $R_i$ is an interval set $\langle a, b \rangle$ with possibly $a = b$ (in which case $R_i = \{a\}$). Hence $h$ has the following general form

\[ h(x) = \pm \sum_{i=1,2,\ldots} (-1)^i \, \mathbb{I}_{R_i}(x). \tag{4} \]

Thus there are exactly two functions $h$ corresponding uniquely to each sequence of sets $R_i$, $i = 1, 2, \ldots$. Unless explicitly specified, the end points of $X = [0, B]$ are not considered roots of $h$, i.e., the default behavior is that outside $X$, i.e., for $x < 0$ or $x > B$, the function 'continues' with the same value it takes at the endpoint $h(0)$ or $h(B)$, respectively. Now, associate with the set sequence $R_1, R_2, \ldots$ the unique non-decreasing sequence of right-endpoints $a_1, a_2, \ldots$ which define these sets (the sequence may have up to two consecutive repetitions except for $0$ and $B$) according to

\[ R_i = \langle a_{i-1}, a_i \rangle, \quad i = 1, 2, \ldots \tag{5} \]

with the first left end point being $a_0 = 0$. Note that different choices for $\langle$ and $\rangle$ (see the earlier definition of a generalized interval $\langle a, b \rangle$) give different sets $R_i$ and hence different functions $h$. For instance, suppose $X = [0, 7]$; then the set sequence $R_1 = [0, 2.4)$, $R_2 = [2.4, 3.6)$, $R_3 = [3.6, 3.6] = \{3.6\}$, $R_4 = (3.6, 7]$ has a corresponding end-point sequence $a_1 = 2.4$, $a_2 = 3.6$, $a_3 = 3.6$, $a_4 = 7$. Note that a singleton set introduces a repeated value in this sequence. As another example consider $R_1 = [0, 0] = \{0\}$, $R_2 = (0, 4.1)$, $R_3 = [4.1, 7]$ with $a_1 = 0$, $a_2 = 4.1$, $a_3 = 7$.

Next, define the corresponding sequence of midpoints

\[ \mu_i = \frac{a_i + a_{i+1}}{2}, \quad i = 1, 2, \ldots. \]

Define the continuous real-valued function $f : X \to [-B, B]$ that corresponds to $h$ (via the end-point sequence) as follows:

\[ f(x) = \pm \sum_{i=1,2,\ldots} (-1)^{i+1} (x - a_i) \, \mathbb{I}_{[\mu_{i-1}, \mu_i]}(x) \tag{6} \]

where we take $\mu_0 = 0$ (see for instance Figure 1). Clearly, the value $f(x)$ equals the width $\omega_h(x)$.

Figure 1: $h$ (solid) and its corresponding $f$ (dashed) on $X = [0, B]$ with $B = 800$

Note that for a fixed sequence of endpoints $a_i$, $i = 1, 2, \ldots$ the function $f$ is invariant to the type of intervals $R_i = \langle a_{i-1}, a_i \rangle$ that $h$ has. For instance, the set sequence $[0, a_1)$, $[a_1, a_2)$, $[a_2, a_3]$, $(a_3, B]$ and the sequence $[0, a_1]$, $(a_1, a_2]$, $(a_2, a_3]$, $(a_3, B]$ yield different binary functions $h$ but the same width function $f$. For convenience, when $h$ has a finite number $n$ of interval sets $R_i$, the sum in (4) has an upper limit of $n$ and we define $a_n = B$. Similarly, the sum in (6) goes up to $n - 1$ and we define $\mu_{n-1} = B$. Let us denote by

\[ \mathcal{F}_+ = \{|f| : f \in \mathcal{F}\}. \tag{7} \]
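The construction of the width function from an end-point sequence can be sketched as follows. This is an illustrative reading of (6) for a finite number $n \ge 2$ of interval sets, with $\mu_0 = 0$ and $\mu_{n-1} = B$; the name `width_function` and the parameter `leading_sign` (the $\pm$ in front of the sum) are hypothetical. At a shared midpoint the two adjacent terms agree, so the sketch returns the first matching term.

```python
def width_function(endpoints, B, leading_sign=1):
    """Return f built from right-endpoints a_1 <= ... <= a_n = B (needs n >= 2)."""
    a = list(endpoints)
    n = len(a)
    if n < 2:
        raise ValueError("need at least two interval sets")
    # mu[0] = 0, mu[i] = (a_i + a_{i+1}) / 2 for 1 <= i <= n - 2, mu[n - 1] = B.
    mu = [0.0] + [(a[i - 1] + a[i]) / 2 for i in range(1, n - 1)] + [float(B)]

    def f(x):
        for i in range(1, n):  # terms i = 1, ..., n - 1 of the sum
            if mu[i - 1] <= x <= mu[i]:
                return leading_sign * (-1) ** (i + 1) * (x - a[i - 1])
        raise ValueError("x lies outside [0, B]")

    return f

# End-point sequence of the example on X = [0, 7].
f_example = width_function([2.4, 3.6, 3.6, 7], B=7)
```

With the leading sign $+1$, the function is negative on $R_1$ and positive on $R_2$, and it vanishes at the singleton $\{3.6\}$; taking absolute values gives the corresponding element of the class in (7).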

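It follows that the hyperclass $\mathcal{H}'_\gamma$ may be represented in terms of the class $\mathcal{F}_+$ as follows: define the hypersets

\[ A_{\beta,f} = \{S \in \mathbb{S}_\ell : f(x) \ge \beta,\ x \in S\}, \quad \beta > 0,\ f \in \mathcal{F}_+ \]

with corresponding hyperconcepts $f'_{\beta,f} = \mathbb{I}_{A_{\beta,f}}(S)$, let

\[ \mathcal{F}'_\gamma = \{f'_{\gamma,f} : f \in \mathcal{F}_+\} \]

and

\[ \mathcal{H}'_\gamma = \mathcal{F}'_\gamma. \tag{8} \]

Hence, it suffices to compute the growth function $\Gamma_{\mathcal{F}'_\gamma}(m)$.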

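Let us now begin to analyze the hyperclass $\mathcal{F}'_\gamma$. By definition, $\mathcal{F}'_\gamma$ is a class of indicator functions of subsets of $\mathbb{S}_\ell$. Denote by $\zeta_N \subset \mathbb{S}_\ell$ a collection of $N$ samples. By a generalized collection we will mean a collection of subsets $S \subseteq X$ with cardinality $|S| \le \ell$. Henceforth we fix a value $m$ and consider only collections

\[ \zeta_N \ \text{such that}\ |\zeta_N| = m \tag{9} \]

where recall the definition of cardinality is according to (3). Let us denote the individual components of $\zeta_N$ by $S^{(j)} \in \mathbb{S}_\ell$, $1 \le j \le N$, hence

\[ \zeta_N = \{S^{(1)}, \ldots, S^{(N)}\}. \]

The growth function may be expressed as

\[ \Gamma_{\mathcal{F}'_\gamma}(m) = \max_{\zeta_N \subset \mathbb{S}_\ell,\ |\zeta_N| = m} \Gamma_{\mathcal{F}'_\gamma}(\zeta_N) = \max_{\zeta_N \subset \mathbb{S}_\ell,\ |\zeta_N| = m} \left| \left\{ [f'(S^{(1)}), \ldots, f'(S^{(N)})] : f' \in \mathcal{F}'_\gamma \right\} \right|. \tag{10} \]

Denote by $s_i^{(j)}$ the $i$th element of the sample $S^{(j)}$ based on the ordering of the elements of $S^{(j)}$ (which is induced by the ordering on $X$). Then

\[ [f'(S^{(1)}), \ldots, f'(S^{(N)})] = \left[ \mathbb{I}\Big( \min_{x \in S^{(1)}} f(x) \ge \gamma \Big), \ldots, \mathbb{I}\Big( \min_{x \in S^{(N)}} f(x) \ge \gamma \Big) \right] = \left[ \prod_{j=1}^{\ell} \mathbb{I}\big( f(s_j^{(1)}) \ge \gamma \big), \ldots, \prod_{j=1}^{\ell} \mathbb{I}\big( f(s_j^{(N)}) \ge \gamma \big) \right]. \tag{11} \]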



Order the elements in each component of $\zeta_N$ by the underlying ordering on $X$. Then put the sets in lexical ordering starting with the first up to the $\ell$th element. For instance, suppose $m = 7$, $N = 3$, $\ell = 4$ and

\[ \zeta_3 = \{\{2, 8, 9, 10\}, \{2, 5, 8, 9\}, \{3, 8, 10, 13\}\}; \]

then the ordered version is

\[ \{\{2, 5, 8, 9\}, \{2, 8, 9, 10\}, \{3, 8, 10, 13\}\}. \]
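The cardinality of a collection (the size of the union of its samples) and the lexical ordering used in the example can be checked with a few lines of Python. The helper names `collection_cardinality` and `lexical_order` are illustrative only.

```python
def collection_cardinality(collection):
    """Cardinality of a collection: the number of distinct points in its union."""
    return len(set().union(*collection))

def lexical_order(collection):
    """Sort each sample by the ordering on X, then sort the samples lexically."""
    return sorted(tuple(sorted(S)) for S in collection)

zeta_3 = [{2, 8, 9, 10}, {2, 5, 8, 9}, {3, 8, 10, 13}]
```

Here the union is {2, 3, 5, 8, 9, 10, 13}, so the cardinality is 7 even though the three samples have 12 elements in total.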

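For any $x \in X$ let

\[ \theta_f^\gamma(x) \equiv \mathbb{I}(f(x) \ge \gamma) \tag{12} \]

(we will sometimes write $\theta_f(x)$ for short). For any sample $S^{(i)}$ of cardinality $|S^{(i)}| \ge 1$ let

\[ e_{S^{(i)}}(f) = \prod_{j=1}^{|S^{(i)}|} \theta_f(s_j^{(i)}). \]

Then for $\zeta_N$ we denote by

\[ v_{\zeta_N}(f) \equiv [e_{S^{(1)}}(f), \ldots, e_{S^{(N)}}(f)] \]

where for brevity we sometimes write $v(f)$. Let

\[ V_{\mathcal{F}_+}(\zeta_N) = \{v_{\zeta_N}(f) : f \in \mathcal{F}_+\} \]

or simply $V(\zeta_N)$. Then from (11) we have

\[ \Gamma_{\mathcal{F}'_\gamma}(\zeta_N) = |V_{\mathcal{F}_+}(\zeta_N)|. \tag{13} \]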
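The thresholded values, the per-sample products and the resulting binary vector can be computed directly. The sketch below uses a hypothetical nonnegative width function `f_demo` (distance to the nearest of two assumed sign changes at 4 and 12); the threshold is taken as non-strict, which is an assumption of this sketch, and the chosen values do not lie on the boundary.

```python
def theta(f, x, gamma):
    """1 if the width at x is at least gamma, otherwise 0."""
    return int(f(x) >= gamma)

def e(f, sample, gamma):
    """Product of theta over the sample: 1 only if every point is gamma-wide."""
    return int(all(theta(f, x, gamma) for x in sample))

def v(f, collection, gamma):
    """Vector of per-sample products for an ordered collection."""
    return [e(f, S, gamma) for S in collection]

def f_demo(x):
    # Hypothetical element of the nonnegative class: distance to the nearest root.
    return min(abs(x - 4), abs(x - 12))

ordered = [(2, 5, 8, 9), (2, 8, 9, 10), (3, 8, 10, 13)]
```

With $\gamma = 1.5$, the points 3, 5 and 13 have width 1 and fail the threshold, so only the second sample is wide and the vector is `[0, 1, 0]`.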

Denote by $X'$ the union

\[ \bigcup_{j=1}^{N} S^{(j)} \equiv X' = \{x_i\}_{i=1}^{m} \subset X \tag{14} \]

and take the elements to be ordered as $x_i < x_{i+1}$, $1 \le i \le m - 1$. The dependence of $X'$ on $\zeta_N$ is left implicit. We will need the following procedure which maps $\zeta_N$ to a generalized collection.



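Procedure G: Given $\zeta_N$ construct $\hat\zeta_{\hat N}$ as follows: Let $\hat S^{(1)} = S^{(1)}$. For any $2 \le i \le N$, let

\[ \hat S^{(i)} = S^{(i)} \setminus \bigcup_{k=1}^{i-1} \hat S^{(k)}. \]

Let $\hat N$ be the number of non-empty sets $\hat S^{(i)}$.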
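A minimal sketch of Procedure G follows, assuming the collection is given as an ordered list of samples. Each sample is stripped of the points already covered by earlier samples and empty results are dropped; the function name `procedure_g` is hypothetical.

```python
def procedure_g(collection):
    """Map an ordered collection to mutually exclusive, non-empty sets."""
    covered = set()   # union of all samples processed so far
    reduced = []
    for S in collection:
        remainder = set(S) - covered
        covered |= set(S)
        if remainder:  # keep only the non-empty sets
            reduced.append(remainder)
    return reduced
```

For the ordered example collection {2,5,8,9}, {2,8,9,10}, {3,8,10,13} the output is {2,5,8,9}, {10}, {3,13}. For the collection {1,2}, {2,3}, {1,3} the third sample is covered by the first two, so only two sets remain.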

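Note that $\hat N$ may be smaller than $N$ since there may be an element of $\zeta_N$ which is contained in the union of other elements of $\zeta_N$. It is easy to verify by induction that the sets of $\hat\zeta_{\hat N}$ are mutually exclusive and their union equals that of the original sets in $\zeta_N$. We have the following:

Claim 1 $|V_{\mathcal{F}_+}(\zeta_N)| \le |V_{\mathcal{F}_+}(G(\zeta_N))|$.

Proof: We make repetitive use of the following: let $A, B \subset X'$ be two non-empty sets and let $C = B \setminus A$. Then for any $f$, any $b \in \{0, 1\}$, if $[e_A(f), e_B(f)] = [b, 0]$, then $[e_A(f), e_C(f)]$ may be either $[b, 0]$ or $[b, 1]$ since the elements in $B$ which caused the product $e_B(f)$ to be zero may or may not also be in $C$. In the other case, if $[e_A(f), e_B(f)] = [b, 1]$ then $[e_A(f), e_C(f)] = [b, 1]$. Hence

\[ |\{[e_A(f), e_B(f)] : f \in \mathcal{F}_+\}| \le |\{[e_A(f), e_C(f)] : f \in \mathcal{F}_+\}|. \]

The same argument holds also for multiple $A_1, \ldots, A_k$, $B$ and $C = B \setminus \bigcup_{i=1}^{k} A_i$. Let $\hat\zeta_{\hat N} = G(\zeta_N)$. We now apply this to the following:

\[ \left| \left\{ [e_{S^{(1)}}(f), e_{S^{(2)}}(f), e_{S^{(3)}}(f), \ldots, e_{S^{(N)}}(f)] : f \in \mathcal{F}_+ \right\} \right| \]
\[ = \left| \left\{ [e_{\hat S^{(1)}}(f), e_{S^{(2)}}(f), e_{S^{(3)}}(f), \ldots, e_{S^{(N)}}(f)] : f \in \mathcal{F}_+ \right\} \right| \tag{15} \]
\[ \le \left| \left\{ [e_{\hat S^{(1)}}(f), e_{\hat S^{(2)}}(f), e_{S^{(3)}}(f), \ldots, e_{S^{(N)}}(f)] : f \in \mathcal{F}_+ \right\} \right| \tag{16} \]
\[ \le \left| \left\{ [e_{\hat S^{(1)}}(f), e_{\hat S^{(2)}}(f), e_{\hat S^{(3)}}(f), e_{S^{(4)}}(f), \ldots, e_{S^{(N)}}(f)] : f \in \mathcal{F}_+ \right\} \right| \tag{17} \]
\[ \le \cdots \]
\[ \le \left| \left\{ [e_{\hat S^{(1)}}(f), e_{\hat S^{(2)}}(f), e_{\hat S^{(3)}}(f), e_{\hat S^{(4)}}(f), \ldots, e_{\hat S^{(N)}}(f)] : f \in \mathcal{F}_+ \right\} \right| \tag{18} \]

where (15) follows since using G we have $\hat S^{(1)} = S^{(1)}$, (16) follows by applying the above with $A = \hat S^{(1)}$, $B = S^{(2)}$ and $C = \hat S^{(2)}$, and (17) follows by letting $A_1 = \hat S^{(1)}$, $A_2 = \hat S^{(2)}$, $B = S^{(3)}$, and $C = \hat S^{(3)}$. Finally, removing those sets $\hat S^{(i)}$ which are possibly empty leaves $\hat N$-dimensional vectors consisting only of the non-empty sets so (18) becomes $\left| \left\{ [e_{\hat S^{(1)}}(f), \ldots, e_{\hat S^{(\hat N)}}(f)] : f \in \mathcal{F}_+ \right\} \right|$. Hence (13) is bounded from above as

\[ \Gamma_{\mathcal{F}'_\gamma}(\zeta_N) \le |V_{\mathcal{F}_+}(G(\zeta_N))|. \tag{19} \]

□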



Denote by $N^* = m - \ell + 1$ and define the following procedure which maps a generalized collection of sets in $X$ to another.

Procedure Q: Given a generalized collection $\hat\zeta_{\hat N} = \{\hat S^{(i)}\}_{i=1}^{\hat N}$, $\hat S^{(i)} \subset X$. Construct $\zeta^*_{N^*}$ as follows: let $Y = \bigcup_{i=2}^{\hat N} \hat S^{(i)}$ and let the elements in $Y$ be ordered according to their ordering on $X'$ (we will refer to them as $y_1, y_2, \ldots$). Let $S^{*(1)} = \hat S^{(1)}$. For $2 \le i \le m - \ell + 1$, let

\[ S^{*(i)} = \{y_{i-1}\}. \]
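Procedure Q can be sketched in the same style: the first set is kept as is and every remaining point becomes its own singleton, in increasing order. The input is assumed to be a list of mutually exclusive sets whose first element has cardinality $\ell$; the function name `procedure_q` is hypothetical.

```python
def procedure_q(generalized):
    """Keep the first set and split all remaining points into ordered singletons."""
    first = set(generalized[0])
    remaining = sorted(set().union(*generalized[1:]))  # the ordered points of Y
    return [first] + [{y} for y in remaining]
```

Applied to the sets {2,5,8,9}, {10}, {3,13} (with $m = 7$ and $\ell = 4$), the result is {2,5,8,9}, {3}, {10}, {13}, which has $m - \ell + 1 = 4$ components.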

We now have the following: 



Claim 2 For any $\zeta_N \subset \mathbb{S}_\ell$ with $|\zeta_N| = |X'|$, we have

\[ |V_{\mathcal{F}_+}(G(\zeta_N))| \le |V_{\mathcal{F}_+}(Q(G(\zeta_N)))|. \]



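Proof: Let $\zeta^*_{N^*} = Q(G(\zeta_N))$ and as before $\hat\zeta_{\hat N} = G(\zeta_N)$. Note that by definition of Procedure Q, it follows that $\zeta^*_{N^*}$ consists of $N^*$ non-overlapping sets, the first $S^{*(1)}$ having cardinality $\ell$ and $S^{*(i)}$, $2 \le i \le N^*$, each having a single distinct element of $X'$. Their union satisfies $\bigcup_{i=1}^{N^*} S^{*(i)} = X'$.

Consider the sets $V_{\mathcal{F}_+}(\hat\zeta_{\hat N})$, $V_{\mathcal{F}_+}(\zeta^*_{N^*})$ and denote them simply by $\hat V$ and $V^*$. For any $\hat v \in \hat V$ consider the following subset of $\mathcal{F}_+$,

\[ B(\hat v) = \{f \in \mathcal{F}_+ : \hat v(f) = \hat v\}. \]

We consider two types of $\hat v \in \hat V$. The first does not have the following property: there exist functions $f_\alpha, f_\beta \in B(\hat v)$ with $\theta_{f_\alpha}(x) \ne \theta_{f_\beta}(x)$ for at least one element $x \in X'$. Denote by $\theta_f = [\theta_f(x_1), \ldots, \theta_f(x_m)]$. Then in this case all $f \in B(\hat v)$ have the same $\theta_f = \theta$, where $\theta \in \{0, 1\}^m$. This implies that

\[ v^*_1(f) = e_{S^{*(1)}}(f) = e_{\hat S^{(1)}}(f) = \hat v_1 \]

while for $2 \le j \le N^*$ we have

\[ v^*_j(f) = e_{S^{*(j)}}(f) = \theta_{k(j)} \]

where $k : [N^*] \to [m]$ maps from the index of a (singleton) set $S^{*(j)}$ to the index of an element of $X'$ and $\theta_{k(j)}$ denotes the $k(j)$th component of $\theta$. Hence it follows that

\[ |V_{B(\hat v)}(\zeta^*_{N^*})| = |V_{B(\hat v)}(\hat\zeta_{\hat N})|. \]

Let the second type of $\hat v$ satisfy the complement condition, namely, there exist functions $f_\alpha, f_\beta \in B(\hat v)$ with $\theta_{f_\alpha}(x) \ne \theta_{f_\beta}(x)$ for at least one point $x \in X'$. If such $x$ is an element of $\hat S^{(1)}$ then the first part of the argument above holds and we still have

\[ |V_{B(\hat v)}(\zeta^*_{N^*})| = |V_{B(\hat v)}(\hat\zeta_{\hat N})|. \]

If however there is also such an $x$ in some set $\hat S^{(j)}$, $2 \le j \le \hat N$, then since the sets $S^{*(i)}$, $2 \le i \le N^*$, are singletons there exists some $S^{*(i)} \subseteq \hat S^{(j)}$ with

\[ e_{S^{*(i)}}(f_\alpha) \ne e_{S^{*(i)}}(f_\beta). \]

Hence for this second type of $\hat v$ we have

\[ |V_{B(\hat v)}(\zeta^*_{N^*})| \ge |V_{B(\hat v)}(\hat\zeta_{\hat N})|. \tag{20} \]

Combining the above, (20) holds for any $\hat v \in \hat V$.

Now, consider any two distinct $\hat v_\alpha, \hat v_\beta \in \hat V$. Clearly, $B(\hat v_\alpha) \cap B(\hat v_\beta) = \emptyset$ since every $f$ has a unique $\hat v(f)$. Moreover, for any $f_a \in B(\hat v_\alpha)$ and $f_b \in B(\hat v_\beta)$ we have $v^*(f_a) \ne v^*(f_b)$ for the following reason: there must exist some set $\hat S^{(i)}$ and a point $x \in \hat S^{(i)}$ such that $\theta_{f_a}(x) \ne \theta_{f_b}(x)$ (since $\hat v_\alpha \ne \hat v_\beta$). If $i = 1$ then they must differ on $S^{*(1)}$, i.e., $e_{S^{*(1)}}(f_a) \ne e_{S^{*(1)}}(f_b)$. If $2 \le i \le \hat N$, then such an $x$ is in some set $S^{*(j)} \subseteq \hat S^{(i)}$ where $2 \le j \le N^*$ and therefore $e_{S^{*(j)}}(f_a) \ne e_{S^{*(j)}}(f_b)$. Hence no two distinct $\hat v_\alpha$, $\hat v_\beta$ map to the same $v^*$. We therefore have

\[ |V_{\mathcal{F}_+}(\hat\zeta_{\hat N})| = \sum_{\hat v \in \hat V} |V_{B(\hat v)}(\hat\zeta_{\hat N})| \]
\[ \le \sum_{\hat v \in \hat V} |V_{B(\hat v)}(\zeta^*_{N^*})| \tag{21} \]
\[ = |V_{\mathcal{F}_+}(\zeta^*_{N^*})| \]

where (21) follows from (20), which proves the claim. □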

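Note that by construction of Procedure Q, the dimensionality of the elements of $V_{\mathcal{F}_+}(Q(G(\zeta_N)))$ is $N^*$, i.e., $m - \ell + 1$, which holds for any $\zeta_N$ (even maximally overlapping) and $X'$ as defined in (9) and (14). Let us denote by $\zeta^*_{N^*}$ any set obtained by applying Procedure G on any collection $\zeta_N$ followed by Procedure Q, i.e.,

\[ \zeta^*_{N^*} \equiv \{S^{*(1)}, S^{*(2)}, \ldots, S^{*(N^*)}\} \]

with a set $S^{*(1)} \subset X'$ of cardinality $\ell$ and

\[ S^{*(k)} = \{x_{i_k}\}, \ \text{where}\ x_{i_k} \in X' \setminus S^{*(1)},\ k = 2, \ldots, N^*. \]

Hence we have

\[ \max_{\zeta_N : |\zeta_N| = m} \Gamma_{\mathcal{F}'_\gamma}(\zeta_N) \le \max_{\zeta_N : |\zeta_N| = m} |V_{\mathcal{F}_+}(Q(G(\zeta_N)))| \tag{22} \]
\[ \le \max_{\zeta^*_{N^*} : |\zeta^*_{N^*}| = m} |V_{\mathcal{F}_+}(\zeta^*_{N^*})| \tag{23} \]

where (22) follows from (11), (13) and Claims 1 and 2 while (23) follows by definition of $\zeta^*_{N^*}$. Now,

\[ |V_{\mathcal{F}_+}(\zeta^*_{N^*})| = \left| \left\{ [e_{S^{*(1)}}(f), \ldots, e_{S^{*(N^*)}}(f)] : f \in \mathcal{F}_+ \right\} \right| \]
\[ \le 2 \left| \left\{ [e_{S^{*(2)}}(f), \ldots, e_{S^{*(N^*)}}(f)] : f \in \mathcal{F}_+ \right\} \right| \tag{24} \]

where (24) follows trivially since $e_{S^{*(1)}}(f)$ is binary. So from (23) we have

\[ \max_{\zeta_N : |\zeta_N| = m} \Gamma_{\mathcal{F}'_\gamma}(\zeta_N) \le 2 \max_{\zeta^*_{N^*}} \left| \left\{ [e_{S^{*(2)}}(f), \ldots, e_{S^{*(N^*)}}(f)] : f \in \mathcal{F}_+ \right\} \right| \]
\[ \le 2 \max_{x_1, \ldots, x_{m-\ell} \in X} \left| \left\{ [\theta_f^\gamma(x_1), \ldots, \theta_f^\gamma(x_{m-\ell})] : f \in \mathcal{F}_+ \right\} \right| \tag{25} \]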



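where $x_1, \ldots, x_{m-\ell}$ run over any $m - \ell$ points in $X$. Define the following infinite class of binary functions on $X$ by

\[ \Theta^\gamma_{\mathcal{F}_+} = \{\theta_f^\gamma(x) : f \in \mathcal{F}_+\} \]

and for any finite subset

\[ X'' = \{x_1, \ldots, x_{m-\ell}\} \subset X \]

let

\[ \theta_f^\gamma(X'') = [\theta_f^\gamma(x_1), \ldots, \theta_f^\gamma(x_{m-\ell})] \]

and

\[ \Theta^\gamma_{\mathcal{F}_+}(X'') = \{\theta_f^\gamma(X'') : f \in \mathcal{F}_+\}. \]

We proceed to bound $|\Theta^\gamma_{\mathcal{F}_+}(X'')|$.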

The class $\Theta^\gamma_{\mathcal{F}_+}$ is in one-to-one correspondence with a class $\mathcal{C}_\gamma$ of sets $C_f \subset X$ which are defined as

\[ C_f = \{x : \theta_f^\gamma(x) = 1\}, \quad f \in \mathcal{F}_+. \]

We claim that any such set $C_f$ equals the union of at most $K \equiv \lfloor B/(2\gamma) \rfloor$ intervals. To see this, note that based on the general form of $f \in \mathcal{F}_+$ (see (6) and (7)), in order for $f(x) \ge \gamma$ for every $x$ in an interval set $\mathcal{J} \subset X$, $\mathcal{J}$ must be contained in an interval set of the form (5) and of length at least $2\gamma$. Hence for any $f \in \mathcal{F}_+$ the corresponding set $C_f$ is comprised of no more than $K$ distinct intervals such as $\mathcal{J}$. Hence the class $\mathcal{C}_\gamma$ is a subset of the class $\mathcal{C}_K$ of all sets that are comprised of the union of at most $K$ interval subsets of $X$. A class $H$ is said to shatter $A$ if $|\{h_{|A} : h \in H\}| = 2^{|A|}$. The Vapnik-Chervonenkis dimension of $H$, denoted as $VC(H)$, is defined as the cardinality of the largest set shattered by $H$. It is easy to show that the VC-dimension of $\mathcal{C}_K$ is $VC(\mathcal{C}_K) = 2K$. Hence it follows from the Sauer-Shelah lemma (see [?]) that the growth of $\mathcal{C}_\gamma$ on any finite set $X'' \subset X$ of cardinality $m - \ell$ (see (25)) satisfies

\[ \Gamma_{\mathcal{C}_\gamma}(X'') \le \sum_{i=0}^{2K} \binom{m-\ell}{i}. \tag{26} \]
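The counting step can be checked by brute force for small cases. A labeling of $n$ distinct points on a line is realizable by a union of at most $K$ intervals exactly when its ones form at most $K$ maximal runs, so the sketch below enumerates all labelings and compares the count with the Sauer-Shelah sum for VC-dimension $2K$. The function names are hypothetical, and the enumeration is exponential in $n$, so it is meant only as a sanity check.

```python
from itertools import product
from math import comb

def runs_of_ones(bits):
    """Number of maximal runs of consecutive ones in a binary tuple."""
    return sum(1 for i, b in enumerate(bits)
               if b == 1 and (i == 0 or bits[i - 1] == 0))

def dichotomies_of_interval_unions(n, K):
    """Labelings of n ordered points realizable by a union of at most K intervals."""
    return sum(1 for bits in product((0, 1), repeat=n) if runs_of_ones(bits) <= K)

def sauer_sum(n, d):
    """Sauer-Shelah bound: sum of C(n, i) for i = 0, ..., d."""
    return sum(comb(n, i) for i in range(d + 1))
```

For $n = 6$ points and $K = 2$ both quantities equal 57, so on this small case the Sauer-Shelah sum is attained by unions of intervals.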



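Since $|\Theta^\gamma_{\mathcal{F}_+}(X'')| = \Gamma_{\mathcal{C}_\gamma}(X'')$, from (26) and (25) it follows that

\[ \Gamma_{\mathcal{H}'_\gamma}(m) \le 2 \sum_{i=0}^{2\lfloor B/(2\gamma) \rfloor} \binom{m-\ell}{i} \]

which proves the statement of the theorem.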



□